Information Retrieval Techniques for Templated Queries
Abstract
Queries in template form are gaining popularity as a means of conveying specific information needs to search engines. We explore the utility of Information Retrieval (IR) techniques in the context of templated queries. Our investigations show that IR techniques known to be well suited to ad hoc retrieval do not extend seamlessly to templated queries. We show that what works is a combination of IR techniques and intuition-driven modifications to the templated queries, which yields statistically significant improvements over the baseline.

Introduction

Queries in template form are an emerging paradigm for conveying complex information needs to search engines. Templated queries are most useful when the same type of information is queried repeatedly. The following templated query from the TREC 2006 complex Interactive Question Answering (ciQA) track (Dang et al., 2006) is one such example. It consists of two parts: the template itself and a narrative.

Template: What evidence is there for transport of [drugs] from [Bonaire] to [the United States]?
Slots: G = drugs, L1 = Bonaire, L2 = the United States
Narrative: The analyst would like to know of efforts made to discourage narco traffickers from using Bonaire as a transit point for drugs to the United States. Specifically, the analyst would like to know of any efforts by local authorities as well as the international community.

A user interested in the general topic of transfer of certain goods from one location to another can use this template by instantiating the free slots with the goods and locations of interest. To elaborate on the information need, the user can also include free-form text in the narrative section.

Once the user has instantiated a template, the next task is to convert the templated query into one that a search engine can process. Using the entire query in its original form can result in poor effectiveness because of extraneous terms such as analyst, specifically, and international community. Hence there is a need for an intermediary system, or query processor, to convert this complex information need into an effective query in the search engine's query language. In creating this effective query, additional information about the template itself can be leveraged. For example, for the template given above, we can create a query that specifies that the locations Bonaire and United States should co-occur in a document, and that the term drugs should be expanded with a list of drugs. The possibility of incorporating template-specific features sets templated querying apart from free-form searching.

Earlier analogues of templated queries include the advanced search options available in most commercial search engines. These options allow users to select date ranges, document types, URL restrictions and so on, and thereby focus and clarify their queries. However, advanced search options are not widely used (Spinks et al., 1999).

Non-commercial search engines like INQUERY (Turtle & Croft, 1991) and Indri provide a framework for posing complex information needs using a structured query language. With such a language, a user can create queries indicating phrases, patterns of text, terms within a certain proximity, synonyms, term absence, term presence and so on. However, harnessing the full power of structured query languages requires thorough knowledge not only of the query language but also of the implementation details of individual features.

Our focus in this paper is on the templated queries defined as part of the Defense Advanced Research Projects Agency (DARPA) Global Autonomous Language Exploitation (GALE) program. The goal of this program is to create a system that will quickly return specific information relating to a user's information need, from broadcast and newswire sources.
The sources could be in English, Chinese or Arabic. Creating such a system requires combining technologies from Machine Translation, Automatic Speech Recognition, Information Retrieval, Information Extraction and Text Summarization. While the final system was required to output snippets of relevant text, our goal in this paper is centered on the Information Retrieval aspect: retrieving relevant documents with high precision to serve as high-quality input for the downstream processes of text summarization and snippet extraction.

It is worth noting that we could have used ciQA or the TREC Genomics Track as a source of templated queries. However, both tracks offer fewer and less diverse queries. Within GALE's first evaluation, the information needs of end-users of the final system were conveyed through one of ten possible templates. In this paper, we focus on seven of them, listed in Table 1. We dropped the remaining three because they lacked an adequate number of training and test queries.

Table 1. The seven GALE templates.
Template Number    Template
1    LIST FACTS ABOUT EVENTS DESCRIBED AS FOLLOWS: [event]
2    PRODUCE A BIOGRAPHY OF [person]
3    PROVIDE INFORMATION ON [organization]
4    FIND STATEMENTS MADE BY OR ATTRIBUTED TO [person] ON [topic(s)]
5    DESCRIBE THE PROSECUTION OF [person] FOR [crime]
6    HOW DID [country] REACT TO [event]?
7    IDENTIFY PERSONS ASSOCIATED WITH [organization] WHO HAVE BEEN INDICTED ALONG WITH HOW THEY'RE RELATED

In this paper we attempt to blend the utility of templated queries with the expressive power of structured query languages. We apply a variety of IR techniques to create structured queries from the templated ones. We show that not all IR techniques are suited to satisfying the information needs conveyed through templated queries, and we devise combinations of IR techniques and manual formulations to build effective structured queries.
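To make the idea of turning a template into a structured query concrete, the following is a minimal sketch for the Bonaire example from the introduction. It is not the authors' query processor. It assumes Indri-style operators (#combine, #syn, the exact-phrase operator #1 and the unordered window #uwN); the function names, the expansion list for drugs and the window size of 50 are all hypothetical choices made for illustration.

```python
def phrase(text: str) -> str:
    """Return a single lowercase term, or an exact phrase #1(...) for multi-word text."""
    words = text.lower().split()
    return words[0] if len(words) == 1 else "#1(" + " ".join(words) + ")"


def synonyms(terms: list[str]) -> str:
    """Group expansion terms as one concept using the #syn operator."""
    return "#syn(" + " ".join(phrase(t) for t in terms) + ")"


def transport_query(goods: list[str], origin: str, destination: str,
                    window: int = 50) -> str:
    """Build a structured query for the 'transport of [G] from [L1] to [L2]' template.

    The two locations must co-occur within an unordered window (the window
    size is an illustrative assumption), and the goods slot is expanded
    with a caller-supplied list of related terms.
    """
    cooccur = f"#uw{window}({phrase(origin)} {phrase(destination)})"
    return f"#combine({synonyms(goods)} {cooccur})"


# Hypothetical expansion list for the [drugs] slot.
query = transport_query(["drugs", "narcotics", "cocaine"], "Bonaire", "United States")
print(query)
```

The narrative text is deliberately left out of the query here, in line with the observation that terms such as analyst and specifically hurt effectiveness. How much of the narrative to keep is a separate design decision.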
Footnotes
1 Indri: http://www.lemurproject.org/indri
2 GALE: http://www.darpa.mil/ipto/programs/gale/index.htm

Template Structure

In addition to the query statement with placeholders for specific instantiations, templated queries in the GALE program also contain some or all of several types of additional restrictions and supporting information. These include date, location and source constraints, additional (related) terms the user thought were useful, and terms or phrases to be treated as equivalent. Using the GALE XML query in Figure 1 as an example, we now elaborate on these elements.

Figure 1: An example GALE XML templated query. [XML markup not preserved; surviving text content: FIND STATEMENTS MADE BY OR ATTRIBUTED TO [Alexander Downer] ON [splits within the EU over Iraq]; Alexander Downer; splits within the EU over Iraq; 04 April 2003; 01 May 2003; Canberra; ALH VOA CNN; Australian Foreign Minister Alexander Downer; EU European Union; divisions challengess security]

1. Query Date
The query date can be one of two types: Source or Activity. By specifying the date to be of type 'Source', the user can indicate that she is interested in documents that were published or